NSF PAR Search | NSF Public Access Repository


Zero-Shot Multi-Label Topic Inference with Sentence Encoders and LLMs

https://doi.org/10.18653/v1/2023.emnlp-main.1008

Sarkar, Souvika; Feng, Dongji; Karmaker Santu, Shubhra Kanti (December 2023, Association for Computational Linguistics)
Bouamor, Houda; Pino, Juan; Bali, Kalika (Ed.)
In this paper, we conducted a comprehensive study with the latest Sentence Encoders and Large Language Models (LLMs) on the challenging task of “definition-wild zero-shot topic inference”, where users define or provide the topics of interest in real-time. Through extensive experimentation on seven diverse data sets, we observed that LLMs, such as ChatGPT-3.5 and PaLM, demonstrated superior generality compared to other LLMs, e.g., BLOOM and GPT-NeoX. Furthermore, Sentence-BERT, a BERT-based classical sentence encoder, outperformed PaLM and achieved performance comparable to ChatGPT-3.5.
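One common way to use a sentence encoder for zero-shot multi-label topic inference is to embed the document and each user-defined topic, then keep every topic whose cosine similarity to the document clears a threshold. The sketch below illustrates that idea only; the abstract does not describe the authors' exact procedure. The function names, the threshold value, and the toy three-dimensional vectors (standing in for real encoder outputs) are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def infer_topics(doc_vec, topic_vecs, threshold=0.5):
    """Return every topic whose embedding is similar enough to the document.

    doc_vec:    embedding of the document (hypothetical encoder output).
    topic_vecs: dict mapping a user-defined topic name to its embedding.
    threshold:  assumed cut-off; several topics may pass (multi-label).
    """
    return [
        name
        for name, vec in topic_vecs.items()
        if cosine(doc_vec, vec) >= threshold
    ]


# Toy embeddings standing in for sentence-encoder outputs (assumed values).
topics = {
    "sports": [1.0, 0.0, 0.0],
    "politics": [0.0, 1.0, 0.0],
    "health": [0.0, 0.0, 1.0],
}
document = [0.8, 0.6, 0.0]
labels = infer_topics(document, topics, threshold=0.5)
```

Because topics are supplied at call time, no retraining is needed when a user adds or renames a topic; only the new topic text has to be embedded.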
Full Text Available
Efficient k-NN Search with Cross-Encoders using Adaptive Multi-Round CUR Decomposition

https://doi.org/10.18653/v1/2023.findings-emnlp.544

Yadav, Nishant; Monath, Nicholas; Zaheer, Manzil; McCallum, Andrew (December 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)
Bouamor, Houda; Pino, Juan; Bali, Kalika (Ed.)
Cross-encoder models, which jointly encode and score a query-item pair, are prohibitively expensive for direct k-nearest neighbor (k-NN) search. Consequently, k-NN search typically employs a fast approximate retrieval (e.g. using BM25 or dual-encoder vectors), followed by reranking with a cross-encoder; however, the retrieval approximation often has detrimental recall regret. This problem is tackled by ANNCUR (Yadav et al., 2022), a recent work that employs a cross-encoder only, making search efficient using a relatively small number of anchor items, and a CUR matrix factorization. While ANNCUR’s one-time selection of anchors tends to approximate the cross-encoder distances on average, doing so forfeits the capacity to accurately estimate distances to items near the query, leading to regret in the crucial end-task: recall of top-k items. In this paper, we propose ADACUR, a method that adaptively, iteratively, and efficiently minimizes the approximation error for the practically important top-k neighbors. It does so by iteratively performing k-NN search using the anchors available so far, then adding these retrieved nearest neighbors to the anchor set for the next round. Empirically, on multiple datasets, in comparison to previous traditional and state-of-the-art methods such as ANNCUR and dual-encoder-based retrieve-and-rerank, our proposed approach ADACUR consistently reduces recall error—by up to 70% on the important k = 1 setting—while using no more compute than its competitors.
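The round-based anchor selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `adaptive_cur_search`, `R`, `exact_score`, `n_rounds`, and `per_round` are hypothetical, the first round uses random anchors, the budget is split evenly across rounds, and `exact_score` stands in for a call to the cross-encoder. `R` is assumed to be a precomputed matrix of cross-encoder scores between a set of anchor queries and all items.

```python
import numpy as np


def adaptive_cur_search(R, exact_score, k, n_rounds=3, per_round=4, seed=0):
    """Approximate top-k search with adaptively chosen anchor items.

    R:           (n_anchor_queries, n_items) precomputed score matrix (assumed).
    exact_score: callable item_index -> float, a stand-in for the cross-encoder.
    """
    rng = np.random.default_rng(seed)
    n_items = R.shape[1]

    # Round 1: score a random set of anchor items exactly.
    first = rng.choice(n_items, size=per_round, replace=False)
    exact = {int(i): exact_score(int(i)) for i in first}

    for _ in range(n_rounds - 1):
        idx = np.array(sorted(exact))
        s = np.array([exact[int(i)] for i in idx])

        # CUR-style estimate of the query's scores against every item,
        # built from the exact scores on the current anchors.
        approx = s @ np.linalg.pinv(R[:, idx], rcond=1e-8) @ R

        # Retrieve the best-looking items that are not anchors yet and
        # add them to the anchor set by scoring them exactly.
        approx[idx] = -np.inf
        for i in np.argsort(-approx)[:per_round]:
            exact[int(i)] = exact_score(int(i))

    # Final answer: the top-k items among those scored exactly.
    ranked = sorted(exact, key=exact.get, reverse=True)
    return ranked[:k]
```

The total number of exact scoring calls is `n_rounds * per_round`, so the budget is fixed in advance; later rounds spend it on items the current approximation ranks near the query rather than on a one-time selection.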
Full Text Available
Boosting Summarization with Normalizing Flows and Aggressive Training

https://doi.org/10.18653/v1/2023.emnlp-main.165

Yang, Yu; Shen, Xiaotong (August 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)
Bouamor, Houda; Pino, Juan; Bali, Kalika (Ed.)
This paper presents FlowSUM, a normalizing flows-based variational encoder-decoder framework for Transformer-based summarization. Our approach tackles two primary challenges in variational summarization: insufficient semantic information in latent representations and posterior collapse during training. To address these challenges, we employ normalizing flows to enable flexible latent posterior modeling, and we propose a controlled alternate aggressive training (CAAT) strategy with an improved gate mechanism. Experimental results show that FlowSUM significantly enhances the quality of generated summaries and unleashes the potential for knowledge distillation with minimal impact on inference time. Furthermore, we investigate the issue of posterior collapse in normalizing flows and analyze how the summary quality is affected by the training strategy, gate initialization, and the type and number of normalizing flows used, offering valuable insights for future research.
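To make "normalizing flows" concrete, the sketch below implements a single planar flow step, one common flow type, together with the log-determinant term that a variational objective needs. It is an illustration only: the abstract does not say which flow types FlowSUM uses, and the function name and parameter values here are assumptions.

```python
import math


def planar_flow(z, u, w, b):
    """Apply one planar flow step f(z) = z + u * tanh(w . z + b).

    Returns the transformed vector and log|det(df/dz)|, which equals
    log|1 + (u . w) * (1 - tanh(w . z + b)^2)| for this flow.
    """
    activation = sum(wi * zi for wi, zi in zip(w, z)) + b
    h = math.tanh(activation)
    transformed = [zi + ui * h for zi, ui in zip(z, u)]

    u_dot_w = sum(ui * wi for ui, wi in zip(u, w))
    log_det = math.log(abs(1.0 + u_dot_w * (1.0 - h * h)))
    return transformed, log_det


# Toy latent sample and flow parameters (assumed values).
z0 = [0.5, -1.0]
z1, log_det = planar_flow(z0, u=[0.3, 0.1], w=[1.0, -0.5], b=0.2)
```

Stacking several such steps and summing their log-determinants gives a more flexible posterior than a single Gaussian, which is the general motivation for flows in variational models.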
Full Text Available